23 research outputs found

    Dwarfs on Accelerators: Enhancing OpenCL Benchmarking for Heterogeneous Computing Architectures

    For reasons of both performance and energy efficiency, high-performance computing (HPC) hardware is becoming increasingly heterogeneous. The OpenCL framework supports portable programming across a wide range of computing devices and is gaining influence in programming next-generation accelerators. Characterizing the performance of these devices across a range of applications requires a diverse, portable and configurable benchmark suite, and OpenCL is an attractive programming model for this purpose. We present an extended and enhanced version of the OpenDwarfs OpenCL benchmark suite, with a strong focus on the robustness of applications and the curation of additional benchmarks, and an increased emphasis on correctness of results and choice of problem size. Preliminary results and analysis are reported for eight benchmark codes on a diverse set of architectures: three Intel CPUs, five Nvidia GPUs, six AMD GPUs and a Xeon Phi. Comment: 10 pages, 5 figures

    Associated Legendre Polynomials and Spherical Harmonics Computation for Chemistry Applications

    Associated Legendre polynomials and spherical harmonics are central to calculations in many fields of science and mathematics: not only chemistry but also computer graphics, magnetism, seismology and geodesy. A number of algorithms for these functions have been published since 1960, but none of them satisfies our requirements. In this paper, we present a comprehensive review of algorithms in the literature and, based on them, propose an efficient and accurate code for quantum chemistry. Our requirements are to calculate these functions efficiently for all non-negative integer degrees and orders up to a given number (<=1000), with the absolute or relative error of each calculated value not exceeding 10E-10. We achieve this by normalizing the polynomials, employing efficient and stable recurrence relations, and precomputing coefficients. The algorithm presented here is straightforward and may be used in other areas of science. Comment: The 40th Congress on Science and Technology of Thailand (STT40)
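The three ingredients named in the abstract (normalization, stable recurrences, precomputed coefficients) can be illustrated with a minimal sketch. This is not the paper's code: it uses the widely known three-term recurrence in degree for normalized associated Legendre functions, and it assumes the normalization in which the integral of the squared function over [-1, 1] equals 1, with the Condon-Shortley phase included. The function names `recurrence_coefficients` and `normalized_legendre` are hypothetical.

```python
import math


def recurrence_coefficients(l_max):
    """Precompute the x-independent coefficients a[l][m], b[l][m] (0 <= m <= l-2)."""
    a = [[0.0] * (l + 1) for l in range(l_max + 1)]
    b = [[0.0] * (l + 1) for l in range(l_max + 1)]
    for l in range(2, l_max + 1):
        for m in range(l - 1):
            a[l][m] = math.sqrt((4.0 * l * l - 1.0) / (l * l - m * m))
            b[l][m] = math.sqrt(((l - 1.0) ** 2 - m * m) / (4.0 * (l - 1.0) ** 2 - 1.0))
    return a, b


def normalized_legendre(l_max, x, coeffs=None):
    """Return p[l][m], the normalized associated Legendre function of degree l, order m.

    Assumed normalization: integral of p[l][m](x)**2 over [-1, 1] equals 1.
    Pass `coeffs` from recurrence_coefficients() to reuse them across many x.
    """
    a, b = coeffs if coeffs is not None else recurrence_coefficients(l_max)
    s = math.sqrt(max(0.0, 1.0 - x * x))  # sin(theta) when x = cos(theta)
    p = [[0.0] * (l + 1) for l in range(l_max + 1)]

    # Sectoral terms p[m][m]: start from p[0][0] and step along the diagonal.
    p[0][0] = math.sqrt(0.5)
    for m in range(1, l_max + 1):
        p[m][m] = -math.sqrt((2.0 * m + 1.0) / (2.0 * m)) * s * p[m - 1][m - 1]

    # First off-diagonal terms p[m+1][m].
    for m in range(l_max):
        p[m + 1][m] = math.sqrt(2.0 * m + 3.0) * x * p[m][m]

    # Remaining terms by the three-term recurrence in degree l at fixed order m.
    for l in range(2, l_max + 1):
        for m in range(l - 1):
            p[l][m] = a[l][m] * (x * p[l - 1][m] - b[l][m] * p[l - 2][m])
    return p
```

Working with normalized values keeps intermediate magnitudes moderate, which is why the unnormalized factorial growth never appears. Because the coefficient tables depend only on degree and order, they can be built once and reused for every evaluation point. For arguments very close to ±1 and orders near 1000 the sectoral terms may underflow in double precision, which this sketch does not guard against.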

    Efficient update of ghost regions using active messages

    The use of ghost regions is a common feature of many distributed grid applications. A ghost region holds local read-only copies of remotely-held boundary data which are exchanged and cached many times over the course of a computation. X10 is a modern par

    PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language

    The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchr

    AIWC: OpenCL-Based architecture-independent workload characterization

    Measuring performance-critical characteristics of application workloads is important both for developers, who must understand and optimize the performance of codes, and for designers and integrators of HPC systems, who must ensure that compute architectures are suitable for the intended workloads. However, if these workload characteristics are tied to architectural features that are specific to a particular system, they may not generalize well to alternative or future systems. An architecture-independent method ensures an accurate characterization of inherent program behaviour, without bias due to architecture-dependent features that vary widely between different types of accelerators. This work presents the first architecture-independent workload characterization framework for heterogeneous compute platforms, proposing a set of metrics determining the suitability and performance of an application on any parallel HPC architecture. The tool, AIWC, is a plugin for the open-source Oclgrind simulator. It supports parallel workloads and is capable of characterizing OpenCL codes currently in use in the supercomputing setting. AIWC simulates an OpenCL device by directly interpreting LLVM instructions, and the resulting metrics may be used for performance prediction and developer feedback to guide device-specific optimizations. An evaluation of the metrics collected over a subset of the Extended OpenDwarfs Benchmark Suite is also presented.